Evaluation and selection of models for out-of-sample prediction when the sample size is small relative to the complexity of the data-generating process
In regression with random design, we study the problem of selecting a model
that performs well for out-of-sample prediction. We do not assume that any of
the candidate models under consideration are correct. Our analysis is based on
explicit finite-sample results. Our main findings differ from those of other
analyses that are based on traditional large-sample limit approximations
because we consider a situation where the sample size is small relative to the
complexity of the data-generating process, in the sense that the number of
parameters in a "good" model is of the same order as sample size. Also, we
allow for the case where the number of candidate models is (much) larger than
sample size.
Comment: Published at http://dx.doi.org/10.3150/08-BEJ127 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
Conditional predictive inference post model selection
We give a finite-sample analysis of predictive inference procedures after
model selection in regression with random design. The analysis is focused on a
statistically challenging scenario where the number of potentially important
explanatory variables can be infinite, where no regularity conditions are
imposed on unknown parameters, where the number of explanatory variables in a
"good" model can be of the same order as sample size and where the number of
candidate models can be of larger order than sample size. The performance of
inference procedures is evaluated conditional on the training sample. Under
weak conditions on only the number of candidate models and on their complexity,
and uniformly over all data-generating processes under consideration, we show
that a certain prediction interval is approximately valid and short with high
probability in finite samples, in the sense that its actual coverage
probability is close to the nominal one and in the sense that its length is
close to the length of an infeasible interval that is constructed by actually
knowing the "best" candidate model. Similar results are shown to hold for
predictive inference procedures other than prediction intervals like, for
example, tests of whether a future response will lie above or below a given
threshold.
Comment: Published at http://dx.doi.org/10.1214/08-AOS660 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
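The interval studied in the paper is constructed differently, but the basic idea of predictive inference after a data-driven selection step can be illustrated with a minimal split-sample sketch: select a model on one half of the data, calibrate the interval length from held-out absolute residuals on the other half, and predict for a new design point. All names and the toy selector below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, 0.0, 0.0]) + rng.normal(size=n)

half = n // 2
X_sel, y_sel = X[:half], y[:half]    # used to select and fit a model
X_cal, y_cal = X[half:], y[half:]    # used only to calibrate the interval

def rss_of(j, Xs, ys):
    """Residual sum of squares of a one-regressor least-squares fit."""
    b = np.linalg.lstsq(Xs[:, [j]], ys, rcond=None)[0]
    return float(((ys - Xs[:, [j]] @ b) ** 2).sum())

# Toy model selection: keep the single regressor with the smallest RSS.
j_hat = min(range(p), key=lambda j: rss_of(j, X_sel, y_sel))
beta = np.linalg.lstsq(X_sel[:, [j_hat]], y_sel, rcond=None)[0]

# Calibrate a nominal 90% interval from held-out absolute residuals.
resid = np.abs(y_cal - X_cal[:, [j_hat]] @ beta)
q = float(np.quantile(resid, 0.9))

x_new = rng.normal(size=p)
center = float(x_new[[j_hat]] @ beta)
interval = (center - q, center + q)
```

Because the calibration half never sees the selection step, its residual quantile remains a valid yardstick for the selected model's prediction error, which is the intuition behind evaluating coverage conditional on the training sample.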
The distribution of a linear predictor after model selection: Unconditional finite-sample distributions and asymptotic approximations
We analyze the (unconditional) distribution of a linear predictor that is
constructed after a data-driven model selection step in a linear regression
model. First, we derive the exact finite-sample cumulative distribution
function (cdf) of the linear predictor, and a simple approximation to this
(complicated) cdf. We then analyze the large-sample limit behavior of these
cdfs, in the fixed-parameter case and under local alternatives.
Comment: Published at http://dx.doi.org/10.1214/074921706000000518 in the IMS
Lecture Notes--Monograph Series
(http://www.imstat.org/publications/lecnotes.htm) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Can one estimate the conditional distribution of post-model-selection estimators?
We consider the problem of estimating the conditional distribution of a
post-model-selection estimator where the conditioning is on the selected model.
The notion of a post-model-selection estimator here refers to the combined
procedure resulting from first selecting a model (e.g., by a model selection
criterion such as AIC or by a hypothesis testing procedure) and then estimating
the parameters in the selected model (e.g., by least-squares or maximum
likelihood), all based on the same data set. We show that it is impossible to
estimate this distribution with reasonable accuracy even asymptotically. In
particular, we show that no estimator for this distribution can be uniformly
consistent (not even locally). This follows as a corollary to (local) minimax
lower bounds on the performance of estimators for this distribution. Similar
impossibility results are also obtained for the conditional distribution of
linear functions (e.g., predictors) of the post-model-selection estimator.
Comment: Published at http://dx.doi.org/10.1214/009053606000000821 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
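The combined procedure the abstract describes can be made concrete with a minimal sketch, assuming Gaussian AIC and least squares as the selection and estimation steps: choose among candidate regressor subsets by AIC, then refit the winner on the same data. The helper names and candidate list are illustrative, not from the paper.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = float(((y - X @ beta) ** 2).sum())
    return beta, rss

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n*log(rss/n) + 2k."""
    return n * np.log(rss / n) + 2 * k

def post_selection_estimate(X, y, candidate_subsets):
    """Select the subset minimizing AIC, then report its OLS fit --
    both steps use the same data, as in a post-model-selection estimator."""
    n = len(y)
    best = min(
        candidate_subsets,
        key=lambda s: aic(fit_ols(X[:, s], y)[1], n, len(s)),
    )
    beta, _ = fit_ols(X[:, best], y)
    return best, beta

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = X[:, 0] * 2.0 + rng.normal(size=n)   # only the first regressor matters
subsets = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
chosen, beta_hat = post_selection_estimate(X, y, subsets)
```

The impossibility result concerns the distribution of `beta_hat` conditional on which subset was chosen; because the selection event depends on the same data as the fit, that conditional distribution cannot be uniformly consistently estimated.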